    Algorithms to Explore the Structure and Evolution of Biological Networks

    High-throughput experimental protocols have revealed thousands of relationships amongst genes and proteins under various conditions. These putative associations are being aggressively mined to decipher the structural and functional architecture of the cell. One useful tool for exploring this data has been computational network analysis. In this thesis, we propose a collection of novel algorithms to explore the structure and evolution of large, noisy, and sparsely annotated biological networks. We first introduce two information-theoretic algorithms to extract interesting patterns and modules embedded in large graphs. The first, graph summarization, uses the minimum description length principle to find compressible parts of the graph. The second, VI-Cut, uses the variation of information to non-parametrically find groups of topologically cohesive and similarly annotated nodes in the network. We show that both algorithms find structure in biological data that is consistent with known biological processes, protein complexes, genetic diseases, and operational taxonomic units. We also propose several algorithms to systematically generate an ensemble of near-optimal network clusterings and show how these multiple views can be used together to identify clustering dynamics that any single-solution approach would miss. To facilitate the study of ancient networks, we introduce a framework called "network archaeology" for reconstructing the node-by-node and edge-by-edge arrival history of a network. Starting with a present-day network, we apply a probabilistic growth model backwards in time to find high-likelihood previous states of the graph. This allows us to explore how interactions and modules may have evolved over time. In experiments with real-world social and biological networks, we find that our algorithms can recover significant features of ancestral networks that have long since disappeared.
Our work is motivated by the need to understand large and complex biological systems that are being revealed to us by imperfect data. As data continues to pour in, we believe that computational network analysis will continue to be an essential tool towards this end.
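
The abstract above names the variation of information (VI) as the measure behind VI-Cut. As a minimal sketch of that measure only (not of VI-Cut itself, whose details are not given here), the following computes VI between two clusterings of the same nodes using the standard definition VI(A, B) = H(A | B) + H(B | A). The function name and the list-of-labels input format are assumptions made for illustration.

```python
import math
from collections import Counter


def variation_of_information(labels_a, labels_b):
    """Variation of information (in nats) between two clusterings.

    labels_a[i] and labels_b[i] are the cluster labels that the two
    clusterings assign to node i. (Hypothetical helper for illustration.)
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same nodes")
    n = len(labels_a)
    count_a = Counter(labels_a)                  # cluster sizes in A
    count_b = Counter(labels_b)                  # cluster sizes in B
    count_ab = Counter(zip(labels_a, labels_b))  # overlap sizes

    vi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        # Each overlap contributes to H(A | B) + H(B | A).
        vi -= p_ab * (math.log(p_ab / p_a) + math.log(p_ab / p_b))
    return vi
```

A VI of zero means the two clusterings are identical up to relabeling; larger values mean they disagree more.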

    A feedback control principle common to several biological and engineered systems

    Feedback control is used by many distributed systems to optimize behaviour. Traditional feedback control algorithms spend significant resources to constantly sense and stabilize a continuous control variable of interest, such as vehicle speed for implementing cruise control, or body temperature for maintaining homeostasis. By contrast, discrete-event feedback (e.g. a server acknowledging when data are successfully transmitted, or a brief antennal interaction when an ant returns to the nest after successful foraging) can reduce costs associated with monitoring a continuous variable; however, optimizing behaviour in this setting requires alternative strategies. Here, we studied parallels between discrete-event feedback control strategies in biological and engineered systems. We found that two common engineering rules, additive-increase multiplicative-decrease (additive increase upon positive feedback and multiplicative decrease upon negative feedback) and multiplicative-increase multiplicative-decrease, are used by diverse biological systems, including for regulating foraging by harvester ant colonies, for maintaining cell-size homeostasis, and for synaptic learning and adaptation in neural circuits. These rules support several goals of these systems, including optimizing efficiency (i.e. using all available resources); splitting resources fairly among cooperating agents, or conversely, acquiring resources quickly among competing agents; and minimizing the latency of responses, especially when conditions change. We hypothesize that theoretical frameworks from distributed computing may offer new ways to analyse adaptation behaviour of biological systems, and in return, biological strategies may inspire new algorithms for discrete-event feedback control in engineering.
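
The two update rules named in this abstract can be sketched in a few lines. This is an illustrative sketch only: the function names, the step size of 1.0, and the factors 2.0 and 0.5 are assumptions chosen for the example, not values taken from the paper.

```python
def aimd_update(rate, success, add_step=1.0, decrease_factor=0.5):
    """Additive-increase multiplicative-decrease (AIMD).

    Positive feedback adds a constant; negative feedback scales the
    rate down. Parameter values are illustrative assumptions.
    """
    if success:
        return rate + add_step
    return rate * decrease_factor


def mimd_update(rate, success, increase_factor=2.0, decrease_factor=0.5):
    """Multiplicative-increase multiplicative-decrease (MIMD)."""
    return rate * (increase_factor if success else decrease_factor)


def run(update, feedback, rate=1.0):
    """Apply an update rule to a sequence of discrete feedback events."""
    trace = [rate]
    for success in feedback:
        rate = update(rate, success)
        trace.append(rate)
    return trace


# Three positive events followed by one negative event.
print(run(aimd_update, [True, True, True, False]))  # [1.0, 2.0, 3.0, 4.0, 2.0]
print(run(mimd_update, [True, True, True, False]))  # [1.0, 2.0, 4.0, 8.0, 4.0]
```

With the same feedback, MIMD ramps up faster than AIMD, which matches the abstract's contrast between acquiring resources quickly and sharing them fairly.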

    A neural theory for counting memories

    Keeping track of the number of times different stimuli have been experienced is a critical computation for behavior. Here, we propose a theoretical two-layer neural circuit that stores counts of stimulus occurrence frequencies. This circuit implements a data structure, called a count sketch, that is commonly used in computer science to maintain item frequencies in streaming data. Our first model implements a count sketch using Hebbian synapses and outputs stimulus-specific frequencies. Our second model uses anti-Hebbian plasticity and only tracks frequencies within four count categories ("1-2-3-many"), which trades off the number of categories that need to be distinguished against the potential ethological value of those categories. We show how both models can robustly track stimulus occurrence frequencies, thus expanding the traditional novelty-familiarity memory axis from binary to discrete with more than two possible values. Finally, we show that an implementation of the "1-2-3-many" count sketch exists in the insect mushroom body.
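
For readers unfamiliar with the data structure this abstract refers to, here is a minimal sketch of a count sketch as it is commonly described in the streaming literature: several hashed rows, a random sign per item and row, and a median over the per-row estimates. It illustrates the computer-science structure only, not the neural circuit or the "1-2-3-many" variant. The class name, the default sizes, and the use of BLAKE2 for hashing are assumptions for the example.

```python
import hashlib
import statistics


class CountSketch:
    """Approximate item frequencies in a stream using fixed memory."""

    def __init__(self, depth=5, width=256):
        # An odd depth makes the median one of the per-row estimates.
        self.depth = depth
        self.width = width
        self.table = [[0] * width for _ in range(depth)]

    def _bucket_and_sign(self, item, row):
        # Derive a per-row bucket index and a +1/-1 sign from one hash.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        value = int.from_bytes(digest, "big")
        bucket = (value >> 1) % self.width
        sign = 1 if value & 1 else -1
        return bucket, sign

    def add(self, item, count=1):
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(item, row)
            self.table[row][bucket] += sign * count

    def estimate(self, item):
        estimates = []
        for row in range(self.depth):
            bucket, sign = self._bucket_and_sign(item, row)
            estimates.append(sign * self.table[row][bucket])
        # The median limits the influence of rows with hash collisions.
        return statistics.median(estimates)
```

Usage: call `add("odor_A")` each time a stimulus is observed and `estimate("odor_A")` to read back its approximate count. Estimates can be off when many distinct items collide in the same buckets.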

    Network trade-offs and homeostasis in Arabidopsis shoot architectures.

    Understanding the optimization objectives that shape shoot architectures remains a critical problem in plant biology. Here, we performed 3D scanning of 152 Arabidopsis shoot architectures, including wildtype and 10 mutant strains, and we uncovered a design principle that describes how architectures make trade-offs between competing objectives. First, we used graph-theoretic analysis to show that Arabidopsis shoot architectures strike a Pareto-optimal trade-off that can be captured as maximizing performance in transporting nutrients and minimizing costs in building the architecture. Second, we identify small sets of genes that can be mutated to shift the weight prioritizing one objective over the other. Third, we show that this prioritization weight feature is significantly less variable across replicates of the same genotype compared to other common plant traits (e.g., number of rosette leaves, total volume occupied). This suggests that this feature is a robust descriptor of a genotype, and that local variability in structure may be compensated for globally in a homeostatic manner. Overall, our work provides a framework to understand optimization trade-offs made by shoot architectures and provides evidence that these trade-offs can be modified genetically, which may aid plant breeding and selection efforts. Gatsby Charitable Foundation Grant number GAT3272.

    Alignment and clustering of phylogenetic markers - implications for microbial diversity studies

    Background: Molecular studies of microbial diversity have provided many insights into the bacterial communities inhabiting the human body and the environment. A common first step in such studies is a survey of conserved marker genes (primarily 16S rRNA) to characterize the taxonomic composition and diversity of these communities. To date, however, there exists significant variability in analysis methods employed in these studies. Results: Here we provide a critical assessment of current analysis methodologies that cluster sequences into operational taxonomic units (OTUs) and demonstrate that small changes in algorithm parameters can lead to significantly varying results. Our analysis provides strong evidence that the species-level diversity estimates produced using common OTU methodologies are inflated due to overly stringent parameter choices. We further describe an example of how semi-supervised clustering can produce OTUs that are more robust to changes in algorithm parameters. Conclusions: Our results highlight the need for systematic and open evaluation of data analysis methodologies, especially as targeted 16S rRNA diversity studies are increasingly relying on high-throughput sequencing technologies. All data and results from our study are available through the JGI FAMeS website (http://fames.jgi-psf.org/).

    A Statistical Growth Property of Plant Root Architectures.

    Numerous types of biological branching networks, with varying shapes and sizes, are used to acquire and distribute resources. Here, we show that plant root and shoot architectures share a fundamental design property. We studied the spatial density function of plant architectures, which specifies the probability of finding a branch at each location in the 3-dimensional volume occupied by the plant. We analyzed 1645 root architectures from four species and discovered that the spatial density functions of all architectures are population-similar. This means that despite their apparent visual diversity, all of the roots studied share the same basic shape, aside from stretching and compression along orthogonal directions. Moreover, the spatial density of all architectures can be described as variations on a single underlying function: a Gaussian density truncated at a boundary of roughly three standard deviations. Thus, the root density of any architecture requires only four parameters to specify: the total mass of the architecture and the standard deviations of the Gaussian in the three (x, y, z) growth directions. Plant shoot architectures also follow this design form, suggesting that two basic plant transport systems may use similar growth strategies.
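
The four-parameter density described in this abstract can be written down under a few stated assumptions. The sketch below assumes an axis-aligned Gaussian centered at the origin, an ellipsoidal truncation boundary at exactly three standard deviations, and renormalization so that the density integrates to the total mass. None of these details are specified in the abstract, and the function names are hypothetical.

```python
import math


def truncated_mass_fraction(cutoff):
    """Fraction of a standard 3D Gaussian within `cutoff` standard
    deviations of its center (chi distribution CDF, 3 degrees of freedom)."""
    return (math.erf(cutoff / math.sqrt(2))
            - math.sqrt(2 / math.pi) * cutoff * math.exp(-cutoff ** 2 / 2))


def root_density(x, y, z, mass, sx, sy, sz, cutoff=3.0):
    """Truncated Gaussian density at (x, y, z).

    The four parameters from the abstract are `mass`, `sx`, `sy`, `sz`.
    The centering, the ellipsoidal boundary, and `cutoff` are assumptions.
    """
    # Squared distance from the center, measured in standard deviations.
    r2 = (x / sx) ** 2 + (y / sy) ** 2 + (z / sz) ** 2
    if r2 > cutoff ** 2:
        return 0.0  # outside the truncation boundary
    norm = truncated_mass_fraction(cutoff) * (2 * math.pi) ** 1.5 * sx * sy * sz
    return mass * math.exp(-0.5 * r2) / norm
```

Under this form, stretching an architecture along one axis only rescales the corresponding standard deviation, which is one way to read the abstract's statement that all architectures share the same basic shape up to stretching and compression.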

    The power of protein interaction networks for associating genes with diseases

    Motivation: Understanding the association between genetic diseases and their causal genes is an important problem concerning human health. With the recent influx of high-throughput data describing interactions between gene products, scientists have been provided a new avenue through which these associations can be inferred. Despite the recent interest in this problem, however, there is little understanding of the relative benefits and drawbacks underlying the proposed techniques.

    Network Archaeology: Uncovering Ancient Networks from Present-day Interactions

    Questions often arise about old or extinct networks. What proteins interacted in a long-extinct ancestor species of yeast? Who were the central players in the Last.fm social network 3 years ago? Our ability to answer such questions has been limited by the unavailability of past versions of networks. To overcome these limitations, we propose several algorithms for reconstructing a network's history of growth given only the network as it exists today and a generative model by which the network is believed to have evolved. Our likelihood-based method finds a probable previous state of the network by reversing the forward growth model. This approach retains node identities so that the history of individual nodes can be tracked. We apply these algorithms to uncover older, non-extant biological and social networks believed to have grown via several models, including duplication-mutation with complementarity, forest fire, and preferential attachment. Through experiments on both synthetic and real-world data, we find that our algorithms can estimate node arrival times, identify anchor nodes from which new nodes copy links, and reveal significant features of networks that have long since disappeared. Comment: 16 pages, 10 figures
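
The idea of reversing a growth model one step at a time can be illustrated with a deliberately simplified case. The sketch below assumes a preferential-attachment model in which each new node arrives with a single edge whose endpoint is chosen with probability proportional to degree, so the network is a tree. At each step it greedily removes the leaf whose arrival would have been most probable under that model. This is an illustration of the general approach under these assumptions, not the paper's algorithms; the function name and input format are hypothetical.

```python
def reverse_growth(adjacency):
    """Estimate reverse arrival order (newest first) for a tree.

    `adjacency` maps each node to the set of its neighbours. Assumes the
    simplified one-edge preferential-attachment model described above.
    """
    graph = {v: set(nbrs) for v, nbrs in adjacency.items()}  # work on a copy
    newest_first = []
    while len(graph) > 2:
        # Total degree of the graph before the candidate leaf arrived.
        total_degree_before = sum(len(nbrs) for nbrs in graph.values()) - 2

        def attach_probability(leaf):
            (target,) = graph[leaf]
            # Degree of the target before the leaf attached to it.
            return (len(graph[target]) - 1) / total_degree_before

        # Sorting makes tie-breaking deterministic (smallest label wins).
        leaves = sorted(v for v, nbrs in graph.items() if len(nbrs) == 1)
        newest = max(leaves, key=attach_probability)
        (target,) = graph.pop(newest)
        graph[target].discard(newest)
        newest_first.append(newest)
    return newest_first
```

Because many leaves often tie under this simple model, the recovered order is only one of several equally likely histories.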